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We consider the statistical properties of photon detection with imperfect detectors that exhibit dark counts 
and less than unit efficiency, in the context of tomographic reconstruction. In this context, the detectors are 
used to implement certain POVMs that would allow to reconstruct the quantum state or quantum process under 
consideration. Here we look at the intermediate step of inferring outcome probabilities from measured outcome 
frequencies, and show how this inference can be performed in a statistically sound way in the presence of 
detector imperfections. Merging outcome probabilities for different sets of POVMs into a consistent quantum 
state picture has been treated elsewhere [K.M.R. Audenaert and S. Scheel, New J. Phys. 11, 023028 (2009)]. 
Single-photon pulsed measurements as well as continuous wave measurements are covered. 

PACS numbers: 03.67.-a,42.50.-p,42.50.Ct 



I. INTRODUCTION 

Estimating quantum states and processes plays an increas- 
ingly important role in quantum engineering as it allows for 
an unambiguous verification of the generation and manipula- 
tion procedures applied to a quantum system. Amongst the 
plethora of reconstruction methods, only few are capable of 
specifying error bars associated with the reconstruction pro- 
cess itself. We have recently developed a Kalman filtering ap- 
proach to quantum tomographic reconstruction fT] based on 
Bayesian analysis employing a linear Gaussian noise model. 

In Ref. lU] we have dealt with quantum state and process re- 
construction from tomographic data obtained by perfect mea- 
surements. In optical tomography, for example, this corre- 
sponds to the assumption that detectors are perfect and detec- 
tor counts represent photon counts faithfully. In reality, how- 
ever, optical detectors are not perfect and exhibit dark counts 
and losses (less than unit efficiency). In addition, mode mis- 
match in the detector connection may lead to further losses. 

In the context of tomographic reconstruction these imper- 
fections have important consequences. The detectors form 
part of an implementation of a POVM {H^^^ E^^^ , . . . , E^'") }, 
with which one endeavours to estimate, say, a quantum 
state p. The measurements consist of frequencies g = 
{gi,g2, ■ ■ ■ , 9k) for each outcome i = 1,2, . . . , K. Each of 
these frequencies corresponds to a probability pt ~ Tr pE*^*' . 
To estimate p, one essentially first estimates the probabilities 
p = {pi,p2, ■ ■ ■ ,pk) from the frequencies g. In the context 
of Bayesian inference, the estimation procedure yields a prob- 
ability distribution for p given the measured g. Indeed, only 
in the limit of an infinite number N of measurements do the 
relative frequencies g/N tend to the probabilities p. For finite 
N, p cannot be known with perfect certainty, and, hence, must 
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be described as a random variable with a certain distribution. 
Bayesian inference tells us what this distribution should be. 

In Ref. lUt] we have shown how knowledge of this distri- 
bution can ultimately lead to a reconstruction of the state in 
terms of a probability distribution over state space; one thus 
obtains a confidence region, rather than a single point in state 
space, as in maximum-likelihood methods. The basic tool for 
this reconstruction is the Kalman filter equation. It requires 
as input the first and second moments of the distribution of p 
inferred from g, for the various POVMs used in the tomogra- 
phy. 

Detector imperfections are important in this respect be- 
cause they have an impact on the inferred distribution of p. 
The measurement mean z taken in by the Kalman filter has to 
reflect losses and dark counts. Equally important is that im- 
perfections lead to additional measurement fluctuations which 
have to be accounted for in the measurement covariance ma- 
trix 0. 

In this article we present a statistically sound method 
for incorporating detector imperfections in the reconstruction 
scheme. One of the main design goals is practicality, and 
speed, without sacrificing statistical accuracy too much. In 
particular, we want to avoid lengthy numerical calculations at 
all costs, excluding any method that reeks of Monte Carlo. To 
this purpose we aim at finding exact formulas for the required 
quantities, or if that is impossible, we introduce several ap- 
proximation methods to reduce the computational complexity 
of finding the quantities numerically. 

The article is organised as follows. We present a mathemat- 
ical model for an imperfect detector in Sec. In Sec. Hill we 
treat the first case of optical detectors used in a setup where 
the optical beam consists of timed single-photon pulses. The 
continuous wave setup is treated in Sec. |V] We also study, 
in Sec. IIVI how one can incorporate imprecisions in the pa- 
rameters that describe the detector imperfections, dark count 
rate and efficiency. We conclude with a brief overview of the 
main results obtained, in Sec.|Vll Appendix lAl is devoted to 
a numerical method for calculating certain integrals that are 
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needed for the calculation of the moments of the distribution 
of p. In appendix|Bj we gather the necessary definitions for a 
number of special functions and special distributions that are 
used extensively in the paper. A number of implementation 
notes are also given. 



n. MODELLING THE PHOTON DETECTION PROCESS 

In this section we present a physical model for an imper- 
fect photon detector and review how the statistical properties 
of such a detector comes about, for further reference. We as- 
sume throughout that the detector operates in Geiger mode, 
so that photon detection consists of single-detection events, 
as opposed to linear mode where an opto-electrical current is 
produced. 




Background 



FIG. 1: Model of an imperfect detector. 

In the theory of quantum detection, an imperfect detector 
exhibiting dark counts is modeled by a compound detector, 
consisting of a perfect detector set in one of the outgoing arms 
of a beam splitter. This beam splitter mixes incoming light 
fields with a background radiation field (see Fig. [Til. The effi- 
ciency r] of the actual detector is modeled by the transmission 
coefficient |Tp of the beam splitter The background radia- 
tion field is assumed to be coupled to a thermal bath, and is 
best described as a multi-mode field. 

Under the additional and well-justified assumption that the 
number of modes in the background field is much larger than 
the number of photons, coupling between background modes 
and incoming modes can be ignored (see, e.g. Refs. |19|] and 
llltl P- 681). Under this assumption, the background photon 
distribution is approximately Poissonian. We will assume that 
the mean value of the number of dark counts per measurement 
interval is known, a value denoted by a. Thus, the number 
of dark counts per measurement interval is a random variable 
R ~ P(a). 

Given this physical model, the detection statistics can be 
derived as follows. The conditional probability that the detec- 
tor produces m counts given that n photons are present in the 
incoming field and r photons in the background field is given 
by i8D 

fM\N,R{fn\n,r) = fM\N,R{m-r\n,0) 
n 



77™-'^(l -77)"-"+^ (1) 



where the binomial coefficient is taken to be whenever 
r > m 01 rn — r > n. Since under the given assumption the 
background photon distribution is approximately Poissonian, 
we set R ~ P(a) and obtain 



fM\N{m\n) = 



r-O ■ 



(1-77) 



n — rii-\-r 



(2) 

For a given photon number distribution of the incoming 
light field, /Ar(n), the distribution of the photon counts is 



fM{m) 



n=0 



fM\N{m\n) fN{n). 



(3) 



One verifies easily that if the incoming light field is Poisso- 
nian, N ^ F{iy), with fN{n) = exp(— the distribu- 
tion of M is Poissonian also, M ~ P(a + ijv), as expected. 

If the incoming light field is in a Fock state, with either 
71 = or rt = 1, the formulas reduce to 



fM\N{m,0) = e " — 



for n — (no input photon), and 



fM\Nim, 1) = 



V 



to! (to — 1)! 



(4) 



(5) 



for n — 1 (single input photon). With short laser pulses, one 
usually only wants to discriminate between to = and to 7^ 
(let alone that further discrimination is at all possible). Hence 
one is only interested in 



/m|a'(0,0) = e-", 
(no click, no input photon), and 

/M|jv(0,l) = e-"(l-77), 



(6) 



(7) 



(no click, 1 input photon), and their complementary values. 
Usually, a is rather small, and one can set e"" w 1 — a. 



III. SINGLE PHOTON PULSES 

In this section we treat the case of a pulsed laser beam, 
where each pulse consists of a single photon. The statistics 
of the detection events are governed by the binomial or multi- 
nomial distribution. We treat three different setups. First, a 2- 
outcome POVM where only one detector is used; the second 
detector, for the second outcome, is left out on the assumption 
that the total number of detection events should be equal to the 
number of pulses anyway. For perfect detectors, this assump- 
tion is correct, while in the presence of detector imperfections 
this is only an approximation. We will study how this affects 
the detection statistics. Next, we treat a 2-outcome POVM 
with both detectors in use and compare it with the previous 
case. Finally, a /C-outcome POVM is considered, generalis- 
ing the K = 2 case. 
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A. Single Detector 

We first consider the most simple case of a 2-outcome 
POVM where only one detector is used. The tomographic ap- 
paratus, apart from the detectors, is hereby treated as a black 
box with 2 output terminals, one for each POVM element, and 
we assume that in each of the N runs, for a fixed setting of the 
POVM, a single photon appears at one of the output terminals. 
Losses in the tomographic apparatus itself are disregarded, 
because that is inessential for the derivation of the detector 
model. The tomography black box can thus be modeled by a 
2-dimensional probability distribution p = {p,l—p), where p 
represents the probability that the photon appears at terminal 
1 (see Fig.|2|l. 



Input 




probability q that the detector cUcks: 
q := Pr(click) 

= Pr(click|no photon)Pr(no photon) 
+ Pr(click|photon)Pr(photon) 

= a{l - p) + (l - p)p 
= a + (1 — a — P)p 
= a + "fp, 

where in the last line we defined 7 as the slope of the q versus 
p curve, 7:=! — a — /3 = (1 — a)r]. 

From this probability, one directly obtains the probability 
that in N runs g clicks are counted given the probability p 
of an incoming photon. Obviously, g should be an integer 
between and N. The conditional probability distribution of 
the count G, conditional on P, is just the binomial distribution 
Bin(iV; q) with probability distribution function (PDF) 

fG\p{9\p) - - (10) 

2. Statistical Inference 



FIG. 2: Model of a 2-outcome POVM where only one detector is 
used. 



Terminal 1 is then connected to a detector with dark count 
rate a and efficiency 77, while terminal 2 is left open; this cor- 
responds to the cheapest implementation of a 2-outcome de- 
tector. The record of an iV-run experiment consists of the 
number of times g the detector has chcked. 



1. Statistical model 

We first derive the statistical properties of the random vari- 
able G, whose observations are the recorded photon count g. 
Its distribution is conditional on P and depends on the param- 
eters a and rj. The standard procedure is to first derive the 
conditional probabilities of a detector clicking or not clicking 
conditional on a photon coming in or not. These are given by 
(cf. SecUD: 



Pr(l|0) = Pr(click|no photon) = a, 

Pr(0|0) = Pr(no cHck|no photon) = 1 - a, 

Pr(0|l) = Pr(no click|photon) = /?, 

Pr(l|l) = Pr(click|photon) = 1-/3. 



Here we have introduced the attenuation factor (3 as 
/?:= (l-a)(l-77). 



(8) 



(9) 



From the general formula (fTOl l describing the statistical be- 
haviour of an imperfect detector we can derive the likelihood 
function Lp^Q that is needed for the Bayesian inference pro- 
cedure. It is immediately clear from Eq. ( fTOl i that the likeli- 
hood function of a + {1 — a — (3) P will be proportional to 
the PDF of a beta-distribution with parameters a = g + 1 and 
b ^ N — g + 1. To that we can add some prior information: 
P is restricted to the interval [0, 1]. This implies that the beta- 
distribution of a + 7P will have to be truncated to the interval 
[a, 1-/3]. 

The moments of this truncated beta-distribution are given 
by (with E[X] denoting the expectation value of a random 
variable X) 



mi 



m2 



B{a,l- p,g + 2,N 



B{a,l-f3,g + l,N-g + iy 
:= E[(a + 7P)2] 

B{a,l- p,g + 3,N - g + 1) 



(11) 



(12) 



B{a,l-p,g + l,N~g + l)- 

Here, B{xo, xi, a, h) is the generalised incomplete beta func- 
tion. In actual numerical computations, it is better to use the 
regularised incomplete beta function Ixo,xi (a, b). Exploiting 
the relation B{a + 1, b)/B{a, b) — a/{a + b), we then get 



Using these conditional probabilities we can calculate the 



mi 



m2 



Ic,,i-p{g + 2,N -g + 1) 
Ia,i-f3{g + l,N ~ g + 1) 
Ia,i-f3ig + S,N - g + 1) 



Ia,i-fiig + l,N - g + 1) 

g + 1 
iv + 2' 

(.9 + l)(g + 2) 



{N + 2)(N + 3)' 



(13) 
(14) 
(15) 
(16) 
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where the first factor in Eqs. ( fTsT l and ( fT4] i is a correction term 
that goes to 1 when a and /3 tend to 0, that is, for ideal detec- 
tors. From these expressions, the central moments of P can 
then be calculated as 



1 — Of — p 



a\P\G = g) 



7712 ~ ^1 



(17) 
(18) 



Figure [3] shows a plot of /i [Eq. dTTI ll as a function of g/N 
for a few values of A^. As could be expected, for sufficiently 
large N, the curve for ji approaches a piecewise linear curve 
with ^ = for < g/N < a, and ^ = 1 for 1 - /3 < g/N < 
1. 




FIG. 3: Plot of ii{P\G = g) [Eq. ^ji] as a function of g/N for 
iV = 10, 100, 1000, and 10000, and values of a = 0.1 and /3 = 0.2. 

Figure m singles out the case iV — 100 and depicts the val- 
ues of the first and second central moments, /i and a. 




FIG. 4: Plot of /i(P!G = g) [Eq. ^ji] (black, central curve) and 
cr{P\G = g) [Eq. idU] (depicted as the grey curves /i ± ct) as a 
function of g/N for iV = 100 and values of a = 0.1 and /3 = 0.2. 



3. Discussion 

A common way to deal with dark counts and non-unit de- 
tector efficiency is to subtract the dark count rate a from the 



relative count frequencies g/N, replacing negative numbers 
by if necessary, and then divide by 1 — a — /3, replacing 
numbers higher than 1 by 1, if necessary. In other words, one 
would use formula (T7\ with g /N in place of mi, and truncate 
the outcome to the interval [0, 1]. 

We argue that there are two distinct problems with this ap- 
proach. First, as we have aheady argued in Ref. [1], for given 
g, the inferred distribution of P is a beta (Dirichlet) distribu- 
tion, not a binomial (multinomial) distribution. Considering 
the extremal case g/N < a, the above method would assign 
to the probability P, which amounts to claiming that the out- 
come can never happen (except for dark counts). Of course, 
never having seen an event does not imply that the event is 
impossible. Indeed, the correct approach, using the beta dis- 
tribution, assigns non-zero mean and variance to P. Second, 
as can be seen from Figs. [3] and 21 the actual behaviour of the 
statistically correct inferences for P vary smoothly with g and 
the truncation mentioned above is only correct in the N ^ 00 
limit. 



B. A 2 -outcome experiment with 2 detectors 

In this Section, we consider the situation where a single 
photon can take one of two paths (with probability p and 1—p, 
respectively), and subsequently impinges on one of two detec- 
tors, each set along one path (Fig.lSj. 



Input 
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1 




1-p 


2 






FIG. 5: Model of a 2-outcome POVM where 2 detectors are used. 



1. Statistical Model 

Let detector i be characterised by a dark rate ai and an 
attenuation factor /3i. Concerning the presence of the pho- 
ton at the detectors, there are two exclusive events: event 
10, where the photon is at detector 1 and not at detector 2, 
or event 01, where the photon is at detector 2 instead. Con- 
cerning the detectors clicking, there are 4 events: 00, 10, 01 
and 11, corresponding to no detector clicking, only detector 1 
clicks, only detector 2 clicks, or both detectors are clicking. 
We stress again that we are considering single-photon exper- 
iments, hence the latter case of both detectors clicking would 
typically correspond to one detector detecting the photon just 
mentioned while the other detector is producing a dark count. 
With perfect detectors such an event would not occur. 

The corresponding conditional probabilities are easily cal- 
culated. Let PT{ij\kl) denote this conditional probability. 



5 



where i = 1 iff detector 1 clicks, j = I iff detector 2 clicks, 
k = 1 iff a photon is at detector 1, and I — 1 iff a photon is 
at detector 2; hence, k + I ~ 1. Because the two detectors 
are independent, we have Pr{ij\kl) = Pi{i\k)Pr{j\l), where 
Pr(-|-) is the single-detector conditional probability (O of the 
previous section. 

Combined with the probability of the photon events 10 and 
01 being Pr(A: = 1) ^ p and Pr(^ = 1) = 1 — p, this gives the 
probabilities of the click events: 

900 = p/3i(l-a2) + (l-p)(l-ai)/32, (19) 

901 = p/3ia2 + (l-p)(l-ai)(l-/32), (20) 
qw = p(l - /3i)(l - as) + (1 - p)ai/32, (21) 
qn - p(l-/3i)a2 + (1- (1 -/32). (22) 

The probabilities of the corresponding event frequencies goo, 
9oi, 9io and gii, counting over N runs, is given by the multi- 
nomial distribution 



/gip — 



N 

500, ffoi, ffio, 5ii 



„9oo „9oi „gio „9ii 

yoo yoi yio yii ■ 



(23) 



Note that if one does not distinguish between single click 
events and two-click events, one is capturing the sums goi + 
gii and gio + gii, in which the 2-click events are counted 
twice. This causes mathematical difficulties in the statistical 
inference process that are best avoided. 

One may actually discard the multiple-click events alto- 
gether, and only record the single-click events gi := 510 and 
.92 ■— .901- This means that one makes no distinction between 
goo and gn. The corresponding distribution is again multino- 
mial, but now given by 



/gi 



G2IP 



N 

51,52,50, 



9!o9o?(9oo + 9ii)' 



(24) 



withc/o = ^^-51 -52- 

In the special case that both detectors are identical, i.e. 
when they have the same dark count rates and attenuation fac- 
tors, ai — a2 — a, and /3i — P2 — P, we find that the third 
factor 900+911 reduces to the constant I3{l — a)+a{l— (i), in- 
dependent of p. Then, considered as a function of p, /gi .G2|p 
is proportional to the binomial PDF (^^^^^)9io9oi, with 



gio = {l-p)a[3+p{l-a){\- 13), (25) 
goi = pa/3+(l-p)(l-a)(l-/3). (26) 



Defining 



ai := a/?, 

a2 := (l-a)(l-/3). 



(27) 
(28) 



we have gio = ai + (02 - ai)p and goi = 02 - (02 - ai)p. 
Furthermore, by defining 



ai 



ai + 02 



(29) 



we find that gio = (ai + 02) [a + (1 — 2a)p] and goi = (^i 
a2)[a+(l-2a)(l-p)]. 



Thus, the PDF /gi,G2|p proportional to the truncated bi- 
nomial PDF: 



/gi,G2|P 



5i + .92 
51 



[a+(l-2a)p]9i [a+[l~2a){l--p)]a\ 

(30) 

This PDF is essentially identical to the PDF ( flOl l obtained in 
the previous section, apart from the fact that the dark count 
rate a and the attenuation factor (3 only enter in the PDF via 
the single constant a. This constant assumes the role of an 
effective dark count rate and is given by 



ai 



a(3 



ai+a2 (1 - a)(l - /3) + a/3' 



(31) 



One sees that a is of the order of a/3, which is a smaller num- 
ber than a and /3. More precisely, we have a/3 < a < 2a/3. 



2. Statistical Inference 

In general, the statistical inference formulas become quite 
complicated, because in the expression for /gi,G2|p more 
than 2 factors appear that have a dependence on p. The subse- 
quent integrals over p can no longer be expressed as (incom- 
plete) beta functions. In this section we treat the easiest case 
of all detectors being equal, and use the PDF ( [30] l, which only 
has two factors. As this PDF is essentially identical to the 
PDF ([Tol l obtained in the previous section, the same results 
therefore hold for the statistical inference. 

We can therefore use formulas (fT3])-(fT8]), provided we per- 

a and 



form the substitutions g 
13 ~* a. This gives 



51, -> + 52, a 



mi (a) 
TO2(a) 

"^1.0 
"^2,0 



/a,l-a(5l+ 2,52 + 1) 



^, 1-0(51 

-fa, 1-0(51 



1,52 

3, 52 



mi,o, 



"1-2.0, 



Ia.l-a{gi + 1,.92 + 1) 

51 + 1 
51+52 + 2' 

(gi + l)(gi + 2) 
(gi+.g2 + 2)(.gi+52 + 3) 



and 



/.(P|G- (.91,52)) = 
a2(P|G = (51,52)) = 



1 -2a ' 

ra2 — ra\ 
{l-2af 



(32) 
(33) 
(34) 
(35) 

(36) 
(37) 



The main difference between Eqs. (|32]|-(|37]| and Eqs. (fT3Tl- 
(fTSl l for the single-detector case is the replacement of a and 
1 — /3 as limits of the incomplete beta functions by a and 1 — a, 
where a is the effective dark count rate given by Eq. ( [3T] l. 



3. Discussion 

We can compare the performance of the two setups, one de- 
tector or two detectors, by comparing the average value of a 
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of the reconstructed distribution of p, for a given value of the 
actual p. In the 1 -detector case, G is distributed according to 
Eq. ( [Tol l. For given actual p, one calculates the average of <t as 
given by Eq. ( fTSl l over this distribution. In the 2-detector case, 
Gi , G2 are distributed according to Eq. dSOl l. and one similarly 
calculates the average of a as given by Eq. dJTl i. Taking, as 
in Fig.O a = 0.1, (3 = 0.2 and N = 100, we find, for var- 
ious settings of the actual p, the values collected in Tab. |T] It 



p 


(t(1 -detector) 


cr(2-detectors) 





0.033 


0.044 


0.5 


0.070 


0.088 


1 


0.04 


0.044 



TABLE I: Average values for a for various settings of p, comparing 
the 1 -detector and 2-detector cases. 

emerges that the one-detector case performs slightly better on 
average. Presumably, this is because for the 2-detector case 
we only used single click events to keep the inference pro- 
cedure simple. That is, the sum gi + 92 is always less than 
N. With the given parameter settings, the average value of 
.91 + .92 is 50 (for any p). However, if one makes more mea- 
surement runs for the 2-detector case, stopping when gi + g2 
is equal to the number of runs for the 1 -detector case, the 2- 
detector setup performs better (Tab. Hill. In Figs. |6]and|7]we 



p 


(7(1 -detector) 


(7(2-detectors) 





0.033 


0.017 


0.5 


0.070 


0.052 


1 


0.04 


0.017 



TABLE II: Average values for a in the 1-detector and 2-detector pro- 
tocols under the additional constraint that gi + g2 equals the number 
of runs for the 1-detector case. 

show what happens to Figs. [3] and |4] for the 2-detector setup 
(under the constraint gi + 52 = 100). The plateaus around 
fi — and /i = 1 are indeed much shorter In addition, the 
error bars (quantified by a) are smaller by a factor of roughly 
1 / (corresponding to an on average increase of by a fac- 
tor of 2). 



0.8 
0.6 
0.4 
0.2 



0.2 



0.4 



0.6 



0.8 



g/N 



0.8 
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0.4 
0.2 



0.2 
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0.6 



0.8 



g/N 



FIG. 7: Plot of ti{P\G = (5, TV - g)) and a{P\G =ig,N^ g)) 
(depicted as the grey curves /izba) as a function of g/N for A'^ = 100 
and values of a = 0.1 and f3 = 0.2. 



4. Unequal Detectors 

In the more realistic case that detector parameters are not 
equal, we need to calculate integrals of the form 

J{g;a,b)= / dp T\iai + b.,py\ 
Jo 

with more than 2 factors. Indeed, the mean of P can be calcu- 
lated from 



Ji = E[ai+biP] 
and its variance from 

J2 = E[(ai+5iP)2] = 



J(g + e^;a,b) 
J{g;a,b) 

J{g + 2e^;a,b) 
Jig:a,b) 



where e* denotes the unit vector along the i-th dimension, 
= (0,...,1,...,0). Hence, 



[Ji - a,)/bi 
{J2 - ' 



2aibifip)/bl - /ip. 



The actual integrations can be performed numerically using 
standard quadrature methods (e.g. Matlab's built-in quadl 
routine). 

To enhance numerical robustness for higher values of = 
J2i for which the integrand is sharply peaked, it is advis- 
able to reduce the integration interval and only integrate over 
that subinterval of [0, 1] where the integrand is higher than, 
say, 10^*^ times its maximal value. This refinement allows the 
quadrature algorithm to better place its quadrature points. 



C. A iC-outcome POVM with K Detectors 



FIG. 6: Plot of fi{P\G = {g,N - g)) as a function of g/N for 
N = 10, 100, 1000, and 10000, and values of a = 0.1 and /3 = 0.2. 



Here, we generalise the results of Sec. HUB I to the case 
where there are K detectors, each one corresponding to one 
of the outcomes. No detector is missing. The tomographic 
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apparatus is now treated as a black box with K output termi- 
nals, one for each POVM element. To keep the calculations 
for the statistical inference transparent, we restrict ourselves 
to the case of identical detectors throughout. 



1. Statistical Model 

Again we assume that in each of the N runs, for a fixed set- 
ting of the POVM, a single photon appears at one of the output 
terminals. The tomography black box is now modeled by a K- 
dimensional probability distribution p = {Pk)k=i^ where pk 
represents the probability that the photon appears at terminal 
k. 

Each terminal is then connected to a detector with dark 
count rate a, efficiency 77, and attenuation factor f3. The 
record of an iV-run experiment consists of the frequencies 
gk, k — 1, . . . ,K, the number of times the fc-th detector has 
clicked and none of the others has. As discussed before, we 
leave out events where more than one detector clicked, in or- 
der not to increase the mathematical complexity. 

We now derive the statistical properties of the vector G = 
{Gk)f^i, whose observations are the recorded photon counts 
g. Its distribution is conditional on the probability vector p 
and depends on the parameters a and [3. 

Let Qk denote the probability of the event Ek that detector 
k clicks and no other We again first calculate the conditional 
probabilities of Ek, conditional on the photon appearing at 
terminal j. For j — k, this conditional probability is (1 — 
- /?); for j 7^ fc it is af5{l - af^-^. 

The probability of event Ek is then given by 



qk 



(1 - - a)(l - P)pk + - pk)] 

ai + (a2 - aijpk 



{K - l)ai +02 
= a + (1 - Ka)pk, 



(38) 



where ai and 02 are defined as before, and the effective dark 
count rate a is defined as 



a := 



ai 



[K- l)ai +02 



(39) 



For the A^-run experiment, the probability of the vector of 
frequencies g = (gi, 32, ■ • ■ , Sk) is therefore proportional to 
the truncated multinomial distribution 



fG\p{9\p)^{ \\{[a+{l-Ka)p 
\9i, ■ ■ ■ ,9k/ 



(40) 



2. Statistical Inference 

From the general formula ( |40] l describing the statistical be- 
haviour of a bank of imperfect detectors we immediately de- 
rive that the likelihood function £p|G is given by 



L 



P\G 



1 ^ 



(1 - Ka)pi 



(41) 



where Af is the normalisation integral, given by the integral of 



-(1-A'a)pi]5 



- ( 1 — Ka)pK] ^'^ over the probability 



simplex pk > 0, X^/cP'^' ~ 1- ^^^^ integral is quite hard to 
calculate, and so are the integrals that are required to calculate 
the moments of Lp^Q. Denoting 



rk 



(1 - Ka)pk, 



(42) 



we get that the random vector R — . . . , Rk) is dis- 
tributed according to a truncated Dirichlet distribution, where 
Rk is subject to the condition Rk > a. 

No analytic expression is known for the integrals involved; 
among the numerical methods to calculate them are numer- 
ical integration, the Gibbs sampling method (a Monte Carlo 
method) [4], and saddle-point approximations [5J. Since for 
neither method commonly available software seems to exist, 
we give some more details about the latter method in Ap- 
pendix [A) where we calculate the normalisation integral of 
the truncated Dirichlet distribution 



J{a;a) := P{R > a) 



drfuir), 



for R ^ Dir(a), where as usual cx = g + 1. The first and 
second order moments about the origin of R can be expressed 
in terms of this integral as 



E\Rf 



E[R^Rj] 



J{a 



e';a) oj^ 
ao 



J{a; a) 
J{a + 2e';a) ai(aj + 1) 
J{a;a) ao(ao + 1) 



(43) 
(44) 
(45) 



J(q:; a) Q;o(ao + 1) 

The moments of P then follow easily from Eq. (|42] |. 

Note, however, that this calculation requires K+\+K{K+ 
l)/2 separate integrations, which can be computationally very 
expensive for larger values of K. For relatively small values 
of a, say a < 0.1, the following provides a moderately good 
approximation: 



K 



(46) 



with Ii^a{ctQ ~ Oli, ai) the regularised incomplete beta func- 
tion [see Eq. dBllI )]. Numerical experiments indicate that this 
approximation is good enough for the calculation of the sec- 
ond order moments of R for values of a as large as 0.1. This 
has been checked for K = 3; with a — 0.1, the approxi- 
mated second order moment differs less than 5% from its ac- 
tual value. Similarly, the first order moments are accurate to 
within O.lcr for a < 0.05. 

The worst case figures appear for extremal values of G, 
i.e. all gi = bar one. Although the relative error for these 
extremal values increases with N, in practice however, these 
extremal values will hardly ever occur, exactly because of the 
presence of dark counts, as indicated by a. Therefore, given a 
and N, we first find the minimal value of the gi that can sen- 
sibly occur and then calculate the relative error for that point. 
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Since G is distribut ed as a trunc ated multinomial one should 
take gi > Na — 2 y^Na{l — a). The relative error for points 
within these boundaries is then less than 0.1a, independently 
of iV. 

We have compared the speed of three methods to calcu- 
late/approximate the moments of P. The calculations have 
been done in Matlab, with the routines for the incomplete beta 
and incomplete gamma function replaced by proprietary C im- 
plementations (available from lllOi] ). Method 1 is the saddle- 
point method combined with one numerical integration (see 
appendix), method 2 is the saddle-point method combined 
with analytical integration of a Taylor series approximation 
(see appendix), and method 3 uses approximation (|46] |. For 
K ^3, a ^0.1 and a = [10, 10, 50], method 1 took 142ms, 
method 2 10ms, and method 3 1.7ms, on an Intel Core2 duo 
T7250 CPU running at 2GHz. Method 1 is the most accurate, 
and method 3 the least. 



IV. DEALING WITH PARAMETER IMPRECISION 

In the previous section we have assumed that the two main 
parameters a and f3 (dark count rate and attenuation factor) 
are known exactly. In realistic situations, however, a and (3 are 
also of a statistical nature, for a variety of possible reasons, in- 
cluding instability of the parameter (drift), imprecision of the 
measurement of the parameter, or plain infeasibility of direct 
measurement. The second best thing to an accurate value for 
a parameter is then a statistical description in terms of a PDF 
or, at the very least, in terms of its mean and central moments 
(variance, and maybe even the skewness). 

In this section we show how this statistical uncertainty 
about the parameters can be included in the inference process. 
For simplicity of the exposition, we will assume that only one 
parameter exhibits imprecision. The general case follows eas- 
ily. 

Suppose, as usual, that we want to obtain an estimate of 
the random variable P and of its variance from measurements 
of G, using the likelihood function ip|G,y y), where 
?/ is a parameter that is described by a random variable Y, 
with given mean variance and possibly higher order mo- 
ments. 

We will assume that the PDF of Y is close to normal, 
namely continuous, single mode, small skewness and kurtosis 
close to the normal value of 3. Almost all of the probability 
mass of Y is then contained in the interval [ji — Scr, /i + Scr] . 
PDFs of this kind can be well approximated by a so-called 
Edgeworth expansion ifTll [T2I1 . A second order Edgeworth 
PDF is just the normal PDF with the given mean and variance: 



1 



27rCT 



A third order Edgeworth PDF adds another term, which con- 
tains the skewness 7. For a standardised random variable 
(zero-mean and unit variance) this PDF reads 



where 4> is the standardised normal PDF (j){y) = 
exp(-2/V2)/V2^. 

Recall that if Y were known perfectly, we would need to 
calculate only the following: 



E[P] 



Iq dp p Lp\Gip\g,y) 
Jo dpLp\Gip\g,y) 

Jo dpp'^ Lp\G{p\g,y) 

Jo <dpLp\G{p\g,v) 



i.e. 3 integrals in total. Since, however, Y enters as a nuisance 
parameter, we must also integrate out Y, taking into account 
the PDF of Y . Hence we need three double integrals, which 
we would like to avoid for efficiency reasons. 

The method we will employ to simplify these calculations 
is to first perform the integration over p (analytically or nu- 
merically, depending on what is possible), then approximate 
each such integral by a polynomial of low degree (3 or 4) in y, 
(this is the idea behind the Newton-Cotes integration formu- 
las) and finally perform the integration over Y analytically, 
with a low-order Edgeworth PDF substituted for the PDF of 
Y. 

To obtain a polynomial approximation we will use La- 
grange interpolation. Let yi be m equidistant points within 
the interval [ji — 3cr, ji + 3cr] (with m equal to 3 or 4), say 
y, = i6, with i = -1,0,1 or i -3/2,-1/2,1/2,3/2. 
Then any function h{y) can be approximated by a polynomial 
h{y) given by Lagrange's interpolation formula 



%)-E%^) n 



y - Ui 
yk-yi 



The integration over Y can now be done analytically, pro- 
vided we choose a low-order Edgeworth PDF for Y . For 
TO = 3 (degree-3 interpolation) and choosing a normal PDF 
for Y yields 



dy fY{y)h{y) 



= i^Ky-i) + (1 - ^)^(yo) + i^Kvi)- 

Hence, if we set 6 = a, this formula simplifies to 



dy fviy) Hy) = [Hi^ -^) + Kii + (j)]/2. (47) 



Hence, only two evaluations of h are needed, i.e. two integra- 
tions over p. As this has to be done for the numerator and 
denominator of E[P] and of E[P^], this gives a total of 6 inte- 
grations. For example, the formula for /ip becomes 



E[P] 



JI dp p L{p\g, - cr) + J^ dp p L{p\g, ^i + a) 
Jo dp L{p\g, /i - cr) + /q dp L{p\g, ^i + a) 



fy.Ay) = 4>iy) - U"'iy) = [i - 7(3 - y')y/my), 



For TO = 4 we can include the skewness 7 of y - it cancels 
out for TO = 3 - by choosing a third-order Edgeworth PDF for 
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Y. When we put S = 2cr, so that the whole ±3(T interval is 
covered, we get in a similar way as before 



dy /y(y) h{y) 
-^MM-3<T) + (i + ^)M/i~^) 



(48) 



This now involves 4 evaluations of h, hence 4 integrals over 
P- 

As a final remark, note that one can place bounds on the 
values of a parameter from the measurement statistics. To 
illustrate this, consider a run of TV 2-outcome pulsed exper- 
iments, with unknown dark count rate, where the number g of 
outcomes ' 1' is very low compared to N. Intuition has it that 
the dark count rate must be small accordingly. The likelihood 
function for P is (see Sec lIIIBl i 



[« + (1 - 2a)pna + (1 - 2a)(l - p)]^-^ 



with effective dark count rate a. Since g is small, this places an 
upper bound on the value of a. In effect, a has to be described 
by a random variable, and Lp contains that random variable. 
By integrating out P from L p, we obtain a distribution for a. 
The exact result is that the PDF of a is proportional to 

/(a) oc / dp[a + {l-2a)py[a+{l-2a){l-p)f-B 
Jo 

_ B{a,l - a,g + 1,N - g + 1) 
1 - 2a ' 

Rather than using the exact result here, one notes that the 
integrand of the second integral is proportional to the PDF 
of a beta distribution and therefore /(a) is essentially the 
cumulative distribution function (CDF) of the complemen- 
tary beta distribution, a function decreasing with a. The 
PDF has mean value /i = {g + l)/(^ + 2) and variance 
cr2 = {g + l){N+l-g)/{N + 2)^{N + 3). Thus, /(a) will 
be significant only for values of a below /x + 3a. For small g 
and large N, we therefore get the promised upper bound on a: 



that individual photons can still be discerned. Photon counts 
are recorded during that same time interval T. The statistics 
are now governed by the Poisson distribution. 

Note that the Poisson distribution is the limiting case of the 
binomial distribution for the number of runs N going to in- 
finity, while the total duration T and the photon rate (aver- 
age number of photons expected during T) are kept constant. 
Therefore, in principle, there should be no essential difference 
between the statistics of this kind of experiment and those of 
the single-photon experiments. However, in CW experiments, 
the intensity of the laser beam enters as a parameter, requiring 
determination. While this determination is possible by per- 
forming independent measurements, a less time-consuming 
approach is to use the actual measurements one is interested 
in. This approach will be described in this section. 

We will assume again that the dark count rate a is known 
exactly. The detector attenuation factor /3 will not show up 
explicitly as it is assumed to be absorbed into the (unknown) 
laser beam intensity. 



A. Statistical Model 

We consider a CW experiment consisting of K runs of 
equal time duration T, and constant but unknown laser inten- 
sity. In each run a different 2-outcome POVM {II^*' , 1 -n^') } 
is applied, but only the counts gi corresponding to II'*) are 
recorded, as was the case in Sec. IIII Al We assume that 
^.nW = 61. The general case, in which Y.. n(*) is not 
a multiple of 1, has been treated (without dark counts) in 
Ref. 1 1]. The purpose of this section is only to show how dark 
counts can be added to the statistical model. Non-unit detec- 
tor efficiency has already been incorporated in the treatment 
of Ref. 1 1] implicitly, by absorbing rj in the beam intensity A. 

As stated in Sec. [Ill for Poissonian input and background 
fields, the counts are Poissonian too, with mean value ^ = 
a + rjv, where a is the dark count rate and v the input photon 
rate. For beam intensity A, and POVM element 11^*^ we have 
V — Api, thus jii ^ a + rjApi. Henceforth, we absorb rj into 
A, thus fii — a + Api. In addition, since H^'^ = 61, we 
have Y^iPi = ^■ 

As the counts gi are independent, and each is Poissonian 
with mean fjLi, the PDF of the sequence of counts G = 
{Gi,G2, ■ ■ ■ ,Gk) is given by 



K 



a< {g + l + 3^gTT)/N. 



(49) 



/G(9)=n' 



9i 



K 



(X e 



-Ab 



Y[{a + Api 



V. POISSONIAN CASE 

In Sec. |III]we have treated a class of tomography experi- 
ments based on single-photon optical pulses, where the statis- 
tics of the recorded photon counts is governed by the bino- 
mial/multinomial distribution. In this section we treat contin- 
uous wave (CW) experiments. Here, the input laser beam is 
turned on for a fixed time T. The detectors are still operating 
in Geiger mode, and the intensity of the laser beam is such 



where factors have been left out that are independent of pi 
and A. In order to formally turn the quantities a + Api into 
a probability distribution, we divide by their sum ^f^iia + 
Api) = Ka + Ab, and define 



Ap^ 



Ka 



Ab 

y 

X 



y + il- Ky)pjb, (50) 

a/x, (51) 
Ka + Ab. (52) 
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Then the PDF of G is proportional to 



B. Statistical Inference 



K 



fc{g) oc 



T{N + l,Ka) 



l[{y + {l-Ky)p,/by^, (53) 



with N := J2t9i- The factor l/r(A^ + l,Ka) has been 
included to normalise the factor e^^'x^ over the interval x > 
Ka. The first factor is, indeed, the PDF of a truncated gamma 
distribution. 

The second factor is essentially the PDF for the single- 
photon case, with y assuming the role of the effective dark 
count rate. The main difference is that y is now a random vari- 
able. Indeed, as the variable A is an unknown, so are x and 
y. In Bayesian terminology, A is a nuisance parameter, and 
the standard Bayesian treatment is to integrate it out. That is, 
fcig) is multiplied by a suitable prior for A, and is then in- 
tegrated over A e [0, cx)]. The problem with this approach is 
that the integral cannot be carried out analytically. 

In what follows, we approximate the integral, based on the 
assumption that the number of total counts N = ^ ■ gi should 
be much larger than the expected total number of dark counts 
Ka, i.e. that the signal-to-noise ratio of the experimental data 
is large enough. This assumption is very reasonable given that 
one actually wants to obtain useful information from the data. 

The main benefit of this assumption is that the truncation of 
X can be disregarded. Indeed, as has been noted in Sec.|Bl the 
normalisation factor r(iV + 1, Ka) is well approximated by 
r(A^+l) when N > l + Ka+SVKa, and similar statements 
hold regarding the moments of the distribution. Thus the PDF 
of G is proportional to 



K 



fcig) (X 



m 



Y[[y+{l-Ky)pJbY', (54) 



where we now allow the random variable X to assume all 
values down to 0. The upshot is that to very good approxima- 
tion, X has a gamma distribution with mean (and variance) 
+ 1. The integral of fc{g) over x is thus a convolution 
of C{g) := Yii^iiv + (1 ^ Ky)pi/b]^' , which depends on 
X via y = a/x, with the gamma PDF of X. Note also the 
resemblance of Eq. ( |54] l to the corresponding Eq. ( |40] l for the 
iiT-detector single photon case, which is not all too surprising. 

A short calculation using the properties of X reveals that 
the variable Y = ajX has mean value /iy — a/N and 
variance erf, = a^/N^{N - 1). As the PDF of Y shows 
small but noticeable deviations from a normal distribution, we 
also need the skewness of Y, which turns out to be = 
Ay/N — 1/{N — 2). Recall that the skewness is defined as the 
third central moment of Y divided by the third power of cry ; 
for this distribution the skewness is roughly equal to two times 
Pearson's mode skewness, and can therefore be interpreted 
as how much the mean differs from the mode, expressed in 
halves of a standard deviation. For this distribution the mode 
ofFis a/{N + 2). 



We can now invoke the methods of Sees. IlllCl and II VI to 
perform the statistical inversion of C{g) with Y as an impre- 
cise parameter with the moments just mentioned, which de- 
pend on the dark count rate a (assumed to be known here) 
and on = J2i9i- regards the additional factor 1/6 in 
Eq. (l54l i. this can be taken into account by multiplying the ob- 
tained mean of P, E[Pi], by b and the second order moments 
about the origin, E[PiPj], by 6^. 

Finally, we can also treat the case where the POVM ele- 
ments do not add up to a multiple of the identity, i.e. when the 
assumption II^*) = 61 is not satisfied. This could occur 
because of inaccuracies in the implementations of the POVM 
elements, or simply because of the choice of elements - before 
Ref. ifT'] it was not known that failure to meet the condition 
^ - n'^'' = 61 had a severely negative impact on the ease with 
which statistical inferences could be made. The consequence 
is that the probabilities pi do not add up to a constant. Their 
sum po ■= ^f=iPi is now a random variable, too, and has 
to be treated as an additional nuisance parameter. This case 
has been treated, for the case without dark counts, in Ref. ul. 
Sec. 3.2.5, under the assumption that the deviation of J2i n^*^ 
from a scalar matrix is small. For larger deviations no accurate 
methods are known to us other than Monte-Carlo methods. 

The formulas obtained in Ref. JTI] carry over easily to the 
case with dark counts, because 6 simply enters as a factor in 
the formulas for the moments of P. Let M and m be the 
largest and smallest eigenvalue of ^ - II'*) . The multiplication 
factors for K[Pi] and ]E[PiPj] (the moments about the origin) 
are now, instead of 6 and 6^, M(j)i and M(/f)2, respectively, 
with 



K 1 - (m/M) 



K+l 



K+1 l-(m/M)^ ' 
K 1 - {m/M)^+^ 
K + 2 1 - {m/M) 



K 



(55) 
(56) 



VI. CONCLUSION 

In this paper, we have studied the statistical properties of 
photon detection using imperfect detectors, exhibiting dark 
counts and less than unit detection efficiency, in the context 
of implementations of general /^-element POVMs. We have 
derived a Bayesian inference procedure for obtaining distri- 
butions over outcome probabilities from detection frequencies 
in a variety of setups. We also obtained formulas and/or algo- 
rithms for efficiently calculating the first and second order mo- 
ments of these distributions, effectively obtaining estimates 
and corresponding error bars for the outcome probabilities. 

For experiments using single-photon laser pulses we have 
considered A' -element POVMs constructed with K detectors 
(with special emphasis on the case K = 2). We found that by 
far the easiest inference procedure occurred when only taking 
single-detection events into account (i.e. only counting events 
where just one out of K detector clicked). In that case, the out- 
come probabilities p are drawn from a truncated Dirichlet dis- 
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tribution cx YliLi [^+ (1 ~ Ka)pi]3' where gi are the detection 
frequencies and a is an effective dark count rate, which can be 
calculated from the actual dark count rate and the detection 
efficiency. For K ~ 2 the moments of this truncated Dirich- 
let can be calculated extremely rapidly using incomplete beta 
functions. For larger K we have devised a number of nu- 
merical algorithms for doing so, offering the user a trade-off 
between accuracy and speed. For K^2we also considered 
a setup with just a single detector, and found slightly different 
formulas for the distribution and its moments. 

While in the above one needs to supply values for dark 
count rate and detector efficiency, we have also devised a 
method for dealing with the case when these parameters are 
not accurately known. This method is particularly useful to 
deal with the final setup we have considered, namely when 
the experiments are done with continuous wave laser beams. 
In that case, the detection statistics is Poissonian and the in- 
ferred outcome probabilities are again drawn from a truncated 
Dirichlet, but now with the effective dark count rate being a 
random variate itself, due to the inaccurately unknown laser 
beam intensity. 

Finally, we also briefly considered how one can obtain an 
upper bound on the effective dark count rate, from the value 
of the minimal frequency of an outcome in any given run (or 
in a combination of runs). 
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The method proposed by Butler and Sutton consists of two 
basic ideas. The first idea is to use a conditional charac- 
terisation of X. Namely, one defines K new, independent 
random variables Zi such that X and Z\ Zi — 1 have 
the same distribution. It is known that one obtains the re- 
quired Dirichlet distribution if Zi has a gamma distribution, 
Zi ^ Gamma(ai, 1). For the purposes of the method, the 
value of the scale parameter 6 does not matter, and we set 
9 = \. The PDF is therefore given by 



r(a,) 



Now the required probability Pr(X > a) can be expressed, 
using Bayes' rule, as 

Pr(X > a) = Pr(Z >a\^Zi = l) 

i 

= Pr(^Z, = l|Z>a)nPr(^.>«)p;:(^-^— Yy- 

(A2) 

The factors Pr{Zi > a) are easily calculated in terms of the 
CDF of the gamma distribution, giving 



Pr(Z, > a) = Q(a„a), 



(A3) 



with Q{ai, a) the regularised incomplete gamma function. 

Since the Zi are independently gamma-distributed, Zi ~ 
Gamma(ai, 1), their sum is also gamma-distributed: J^i ^ 
Gamma(aoi !)■ The factor Pr(X)i = 1) is therefore given 
by the value of the PDF of Gamma(Q;o, 1) in 1, which gives: 



1/PrK^Z, = 1 =er(ao)- 



(A4) 



APPENDIX A: INTEGRALS OF TRUNCATED DIRICHLET 
DISTRIBUTIONS 

In order to calculate the moments of the truncated Dirich- 
let distribution, one must be able to accurately calculate the 
distribution's normalisation integrals. In this Appendix, we 
describe an approximation method due to Butler and Sutton 

a. 

Let X ^ Dir(a) be a Dirichlet distributed if-dimensional 
random variable, with parameters ex. ~ {ai, . . . , a/f ). This 
assumes that Xi> and -^i — 1 hold. We will use the 
common notation —J^i'^i- 

Let us now truncate X, by imposing the condition Xi > 
a, where < a < 1/ K. The goal is to calculate the new 
integration constant given by the probability Pr(X > a). We 
will denote this probability integral by J: 



J{ol; a) 



K ai-l 



dx r(ao) n ^ 



(Al) 



Note that for K = 2, this integral is given by the regularised 
incomplete beta function Ia,i-a{o.i, 0,2)- 



The first factor in Eq. dAll, the ti'uncated PDF Pi{J2^ = 
1\Z > a), is the hardest to calculate, because it is a multi- 
dimensional integral, and the second idea in Butler and Sut- 
ton's method is to convert it to an inverse Laplace integral of 
a univariate function, and then approximate the latter integral 
using a saddle -point method, as first proposed by Daniels |6]. 

The method starts from the moment generating function 
(MGF) of the truncated random variable T = J^i^ilZ > a, 
defined as Mt{s) = Erie'**]. Since the Zi are independent, 
we have 



(A5) 



where Ti Zi\Zi > a. A simple calculation gives 

rdt e^*i"'-ie-* 

J a 



J a 

T , (A6) 

Q[ai,a) 



which is valid for Sfis < 1 (and we do need complex s). The 
denominators cancel with the factors Pi{Zi > a) = Q{ai, a). 
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Since the MGF Mt{—s) is the two-sided Laplace trans- 
form of the PDF, the PDF can be recovered from the MGF by 
an inverse Laplace transform: 



2 /•7+ioo 



where, in our case, we only need to evaluate the PDF at the 
point i = 1. By expressing the MGF as the exponential of the 
cumulant generating function (CGF) Kt{s) := logMT(s), 
the path of integration can be brought in a form that readily in- 
vites the saddle-point method for its approximate evaluation: 



(A7) 



The path of integration is hereby chosen to pass through a 
saddle-point of the integrand, in such a way that the integrand 
is negligible outside its immediate neighbourhood. Daniels 
shows that in this case the path should be a straight line par- 
allel to the imaginary axis and passing through the saddle- 
point s, which is that value of s for which the derivative of 
Kt{s) — st w.r.t. s vanishes: 



K't{s) =t. 



(A8) 



Daniels showed that, under very general conditions, s is real. 
Hence, in Eq. ( IA7I ). one takes 7 = s, and the path of integra- 
tion is along points s = s + iy. 
An explicit formula for K!j,{s) is 



K't{s) 



(A9) 



with u — a(l — s) and g{a,u) = e~'^u"'' /r(ai,u). One 
shows that g{a,u) is roughly approximated by max[0,M — 
(a — 1)]; moreover, g{a,u) > max[0,u — (a — 1)]. An 
approximate value ofs — 1 — u/ais thus given by the solution 
of 



u 
a 



ao 



max[0, u — (ui — 1)] . 



(AlO) 



As the right-hand side is a piecewise linear function of u, the 
solution of this equation is easily found. This approximate 
solution can then be used as a starting value for numerically 
solving the exact equation 



= "0 



^g{ai,u). 



Once the optimal value s has been obtained, one can go 
about performing the integration in Eq. (|A7| l, i.e. of 



1 



/t(1) - - / n[MT{s + ty)e~^'+'y'>]dy, 



(All) 



where we have exploited the fact that the real part of the inte- 
grand is even in y. To obtain the highest accuracy, the integra- 
tion has to be done using a numerical quadrature (e.g. using 



Matlab's built-in quadl routine). The upper integration limit 
can be replaced by a finite value, equal to a fixed number times 
the approximate width of the function graph, which is roughly 
1/^X^(1), where 



ai 



(l-s)2r(a„a(l-s))2 

If speed is at a premium, while somewhat less precision 
is acceptable, one can use a finite-term Taylor expansion of 
Kt{s) — s, and integrate each of the resulting terms analyti- 
cally. The saddle-point approximation is obtained by writing 
Kt{s) — s as a Taylor series around s ~ s: 



00 

kt{s) - s = kt{s) - s + e -,K^r^\myy 

and expanding the integrand as 



J=2 ■ 



J=3 



Kr 



(5) 



X <! 1 _ + + ^tlX^y^ 



6 



24 



120 



K, 



(6) 



V 



72 720 
with each of the derivatives of Kt evaluated in s. 





10 15 20 25 30 35 40 45 50 



FIG. 8: Plot of J(qi, Q2; 0.1) as calculated using the second-order 
saddle-point method (blue, solid curve), and the absolute error, in 
units of 10^* (red, dashed curve), as compared to the exact result 
^0.1,0.9(0:1, 02). The sum qo = Qi + 02 is held constant at a value 
of 50. The maximal absolute error here is 2.4571 x 10~^ and the 
maximal relative error is 2.5189 x 10~^. 

Upon performing the integral dy the terms with odd 
powers of y vanish. After substituting Kij^y^ /2 = v'^, and 
using 

f +00 

e-''\^^dv = r(fc+ 1/2), 
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with r(l/2) = 0F, r(2 + 1 /2) 
ISa/tF/S, the even powers yield 



30F/4andr(3 + l/2) 



Ml) 



1 



1 K, 



(4) 



5 {K, 



8 [K'^Y 



24 (i^^)3 



(A12) 

(note that in the corresponding formula (7) in Ref. [5] a minus 
sign is missing). 

In Fig. [8] we give an example of the 2-dimensional integral 
J{ai, a2; a) calculated using this method and compare it to 
the exact result, which for K = 2 is known to be the regu- 
larised incomplete beta function Ia,i-a{c(i, 012)- The Matlab 
routines used to perform these calculations are available from 



APPENDIX B: MATHEMATICAL COMPENDIUM 

In this appendix we gather a few mathematical preliminar- 
ies that are necessary to understand the statistical models de- 
veloped in Secs. HlWl 



1. Special functions 

We start by collecting some important results on special 
functions and their implementations in various computer al- 
gebra software. 



a. Gamma function 



The gamma function T{a) is defined as the integral 



r(a) = / 

Jo 



(Bl) 



with r(fc) = (fc — 1)! for integer arguments. Since for 
large values of its argument, the gamma function becomes 
extremely large, numerical packages usually contain imple- 
mentations of the natural logarithm of the gamma function 
too (gammaln in Matlab, and LogGamma in Mathematica). 
We will need this as well. 

The gamma integral leads to two incomplete integrals, the 
lower incomplete gamma function 7(0;, x) and the upper 
incomplete gamma function T{a,x): 



r(a, x) 



(B2) 
(B3) 



Obviously, one has 7(0;, x) + r{a,x) = T{a). By divid- 
ing these incomplete gamma functions by the corresponding 
complete gamma, one obtains the regularised incomplete 
gamma functions: 



P{a,x) = 7(a,a;)/r(a), 
Q{a,x) = r(a,a;)/r(a), 



(B4) 
(B5) 



with P + Q = 1. 

In Mathematica, Gamma [a, x] is the upper incom- 
plete gamma function r{a,x), while Gamma [a, xo, xi ] 
is the generalized incomplete gamma function, so 
that 7(a,a;) — Gamma [a, 0, x] . The regularised 
incomplete gamma functions are implemented as 
Q{a,x) = GammaRegularized [a, x] and P{a,x) — 
GammaRegularized [a, 0, x] . 

In Matlab, P{ct, x) has been implemented as 
gammainc (x, a) (note the reversal of the arguments). 
Except in older versions, Q has been implemented too, as 
gammainc (x, a, ' upper' ) . 

The two basic expansions that are used in these calculations 
are the series expansion (see, e.g. Ref. lfl3ll . formula 6.5.29) 



fe=0 



r(a + fc + 1)' 



for a; < a + 1, and the continued fraction expansion (see, e.g. 
Ref. iQ, formula 6.5.31) 



Q(q;, x) 



r(a) 



1 1 



a 1 2 

I- x+ \ 



a 2 

I- a;+ 



for X > a + 1. Here we used the typographical notation for 
continued fractions: (^T'') ~ fcTc' where c stands for ev- 
erything that follows. For the other regimes one can use the 
formula P + Q = 1. If high accuracy is needed for extremely 
small values of P or Q, one should calculate the logarithm. 

b. Beta function 

The beta function B{a,b), a generaUzation of the gamma 
function, is defined as 



B{a,b) 



6-1 



It is related to the gamma function via 

r(a)r(b) 

T(a + bj 

This leads to the relation 

B{a + 1, b)/B{a, b) = a/{a + b), 



(B6) 



(B7) 



(B8) 



For integer arguments, one sees that -B(a, b) is related to the 
binomial coefficient as 



B{a,b) 



{a-mb-l)\ 
{a + b-l)\ 



ab 



'a+b\ 



Since, again, the natural logarithm of the beta function is 
usually implemented directly [in Matlab: betaln (a,b)], 
this formula allows evaluation of the binomial coefficients for 
larger values of the arguments than allowed by direct calcula- 
tion. 
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Just as in the case of the gamma function, replacing 
the integration limits yields the incomplete beta function 
B{x, a, b) and the generalised incomplete beta function 

B{x,a,b) = / dx x^-^l - x)''-^ (B9) 

^0 

B{xo,xi,a,b) = / dx x''-^! - x)''-^ . (BIO) 

Jxo 

Dividing by the complete beta function also gives the regu- 
larised incomplete beta function and the generalised regu- 
larised incomplete beta function 

I^{a,b) = B{x,a,b)/B{a,b), (Bll) 
Ixo,xtia,b) B{xo,xi,a,b)/B{a,b). (B12) 

In Matlab, only Ix{a,b) — Ia.x{a,b) and lx.i{0',b) = 
1 — Ix{a,b) are implemented, as betainc (x, a, b) 
and betainc (x, a, b, ' upper ') , the latter only 
in more recent versions, while in Mathematica all 
four functions exist, under the names Beta[x,a,b], 
Beta[xO,xl,a,b], BetaRegularized [x, a, b] 
and BetaRegularized [xO , xl , a, b] . Just as for the 
incomplete gamma functions one may need a logarithmic 
version of to cover cases with extremely small function 
values. 

Calculations are based on the continued fraction expansion 
of Ix, which is vahd for x smaller than {a — l)/(a + & — 2) 
(see, e.g. Ref. iQ, formula 26.5.8): 



Ix{a,b) 



x^il 



aB(a, b) 



1 



1+ 



1+ 



(B13) 



with 



d2m+l 
d2m 



(a + to) (a + b + m) 
(a + 2m)(a + 2m + 1^ 
to(6 — to) 



(a + 2m - l)(a + 2to) 



x. 



For larger x, one uses the relation Ii^^ib, a) = 1 — Ix{a, b), 
where the left hand side is numerically more accurate for 
small function values. In case the continued fraction expan- 
sion fails, one can still use certain approximations (see, e.g. 
Ref. Ijj, formulas 26.5.20 and 21). 



2. Poisson, Gamma, Beta and Dirichlet Distributions 

The probability distribution function (PDF) of a discrete 
random variable K that is distributed according to the Pois- 
son distribution, K ^ P(A), is 



fK{k) 



kl ■ 



(B14) 



We also recall a number of basic facts about several contin- 
uous distributions |2, 3]. The gamma distribution is directly 
related to the gamma function. The PDF of a random vari- 
able X that is distributed according to the gamma distribution 
X ~ Gamma(a, 9), with a the shape parameter and 6 the 
scale parameter, is given by 



fx{x) 



—x/9^a — l 

6l"r(a) ■ 



We will not need the extra freedom offered by 9, and we will 
always put 9 — 1, giving 



fxix) 



Via) 



(B15) 



For X = X and a = k+1, this PDF looks formally the same 
as the Poisson PDF. However, in the latter K is the random 
variable, rather than X. In effect, the gamma distribution and 
Poisson distribution are each other's conjugate. 

The cumulative distribution function (CDF) of X is the reg- 
ularised lower incomplete gamma function P: 



Pi{X > a;) = P{a,x), 



and its moments are given by 



fJ-x 



'X 



(B16) 



(B17) 



For not too small values of a, the bulk of the probability mass 
of the gamma distribution is roughly contained within the in- 
terval [ji — 3(7, ji + 3cr] = [a — iy/a, a + 3\/a] . This ex- 
plains why P{a,x) is very close to for x < a — "i-Ja 
and very close to 1 for (roughly) x > a + 3^/a- A more 
accurate statement is that for x > a + 2.8 + 3.09v^, or 
a < a; + 1.9 - 3.09y/x - 0.41, P{a, x) > 0.999. 

The Dirichlet distribution is the higher-dimensional gen- 
eralisation of the beta distribution. The importance of this dis- 
tribution stems from the fact that it is the conjugate distribu- 
tion of the multinomial distribution: if ~ Mtn(iV, p) is the 
distribution of F conditional on P = p, then using Bayesian 
inversion (starting with a uniform prior for P) P conditional 
on P = / is Dirichlet distributed with parameter /. Formally, 
the two distributions only differ by their normalisation. The 
multinomial distribution is normalised by summing over all 
integer non-negative / summing up to A^, while the Dirichlet 
distribution is normalised by integrating over the simplex of 
non-negative p summing to 1 . 

The general form of the PDF of a d-dimensional Dirichlet 
distribution with parameters ai is (see, e.g. Ref. iH], Chapter 
49) 



/p(p) = r(«o)n^, 



where ao is defined as 



Its mean and variance are both equal to A. 



d 

ao := ^ ai 



(B18) 
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The range of P is the simplex Pi > 0, ^ pi = 1. 
The mean values of the Dirichlet distribution are 



(B19) 



The beta distribution is the special case of a Dirichlet dis- 
tribution with d — 2. The normalisation factor is then the beta 
function B{ai , a2), from which the distribution got its name. 



and the elements of its covariance matrix are 



cti(ao—ai) 
"g("o + l) ' * 



ao("o + l)' 



J 



(B20) 
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